perf: Add metric for time spent casting in native scan #919
def scanMetrics(sc: SparkContext): Map[String, SQLMetric] = {
  Map(
    "cast_time" ->
      SQLMetrics.createNanoTimingMetric(sc, "Total time for casting arrays"))
"arrays" in "casting arrays" could be confused with the Spark array type, so the metric might be read as time spent casting Spark arrays.
Maybe "casting columns"?
@@ -345,16 +347,20 @@ struct ScanStream<'a> {
    baseline_metrics: BaselineMetrics,
    /// Cast options
    cast_options: CastOptions<'a>,
    /// Timer for cast operations
Is this elapsed time?
@@ -198,9 +198,12 @@ case class CometScanExec(
    // Tracking scan time has overhead, we can't afford to do it for each row, and can only do
    // it for each batch.
    if (supportsColumnar) {
      Some("scanTime" -> SQLMetrics.createNanoTimingMetric(sparkContext, "scan time"))
      Seq(
Should this be a Map instead of a Seq?
LGTM, thanks @andygrove
Which issue does this PR close?
N/A
Rationale for this change
Make it easy to see how much of the native ScanExec time is spent casting columns to different types (this usually means unpacking dictionaries).
Example from TPC-DS q9:
DataFusion metrics in native explain output:
Full plan:
What changes are included in this PR?
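The diff excerpts in the review add a "Timer for cast operations" field to the native scan stream and a nanosecond timing metric on the Spark side. As a rough illustration of that idea only, here is a minimal sketch of accumulating elapsed wall-clock time around each per-batch cast. It uses only the Rust standard library; `CastTimer`, `cast_column`, and the sample batches are hypothetical names for this example and are not taken from the PR, which may well use DataFusion's own metric types instead.

```rust
use std::time::{Duration, Instant};

/// Hypothetical accumulator for time spent in cast operations.
/// (Illustrative only; not the type used in the PR.)
#[derive(Debug, Default)]
struct CastTimer {
    total: Duration,
}

impl CastTimer {
    /// Runs `f`, adds its wall-clock duration to the running total,
    /// and returns whatever `f` returned.
    fn time<T>(&mut self, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.total += start.elapsed();
        out
    }

    /// Total accumulated time in nanoseconds.
    fn elapsed_nanos(&self) -> u128 {
        self.total.as_nanos()
    }
}

/// Stand-in for a real column cast: widens i32 values to i64.
fn cast_column(values: &[i32]) -> Vec<i64> {
    values.iter().map(|&v| i64::from(v)).collect()
}

fn main() {
    let mut timer = CastTimer::default();
    let batches = vec![vec![1, 2, 3], vec![4, 5]];
    let mut rows = 0;
    // Time once per batch rather than per row to keep overhead low.
    for batch in &batches {
        let casted = timer.time(|| cast_column(batch));
        rows += casted.len();
    }
    println!("cast {} rows in {} ns", rows, timer.elapsed_nanos());
}
```

Timing per batch rather than per row mirrors the comment in the `CometScanExec` hunk about the overhead of tracking scan time.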
How are these changes tested?